Multimodal Emotion Recognition Based on the Decoupling of Emotion and Speaker Information

Authors

  • Rok Gajsek
  • Vitomir Struc
  • France Mihelic
Abstract

The standard features used in emotion recognition carry, besides the emotion-related information, also cues about the speaker. This is expected, since the variations introduced by emotionally colored speech resemble the variations in the speech signal caused by different speakers. We therefore present a gradient-descent-derived transformation for decoupling the emotion and speaker information contained in the acoustic features. The Interspeech '09 Emotion Challenge feature set is used as the baseline for the audio part. A similar procedure is applied to the video signal, where nuisance attribute projection (NAP) is used to derive a transformation matrix that contains information about the emotional state of the speaker. Finally, different NAP transformation matrices are compared using canonical correlations. The audio and video sub-systems are combined at the matching-score level using different fusion techniques. The presented system is evaluated on the publicly available eNTERFACE'05 database, where significant improvements in recognition performance are observed over the state-of-the-art baseline.
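Two ingredients of the abstract lend themselves to a compact illustration: nuisance attribute projection (NAP) and the comparison of subspaces via canonical correlations. The sketch below is a minimal, generic NAP-style construction in Python/NumPy; the function names, the use of within-speaker scatter as the nuisance estimate, and the parameter k are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def nap_projection(features, speaker_labels, k=10):
    """Estimate a nuisance (speaker) subspace and return a projection
    removing its top-k directions. A generic NAP-style sketch, not the
    authors' exact procedure.

    features:       (n_samples, d) feature matrix
    speaker_labels: array of speaker ids, one per sample
    k:              number of nuisance directions to remove
    """
    d = features.shape[1]
    # Within-speaker scatter: variation attributed to the nuisance factor.
    scatter = np.zeros((d, d))
    for spk in np.unique(speaker_labels):
        x = features[speaker_labels == spk]
        xc = x - x.mean(axis=0)
        scatter += xc.T @ xc
    # Top-k eigenvectors span the estimated nuisance subspace U.
    eigvals, eigvecs = np.linalg.eigh(scatter)
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # NAP projection onto the complement of the nuisance subspace.
    P = np.eye(d) - U @ U.T
    return P, U

def canonical_correlations(U1, U2):
    """Canonical correlations between two subspaces with orthonormal
    bases: the singular values of U1^T U2 (cosines of principal angles)."""
    return np.linalg.svd(U1.T @ U2, compute_uv=False)
```

Applying P to the features removes the strongest speaker-dependent directions before classification, while canonical correlations close to 1 indicate that two estimated subspaces capture similar variation, which is one way such transformation matrices can be compared.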


Similar resources

Cross-Subject Continuous Emotion Recognition Using Speech and Body Motion in Dyadic Interactions

Dyadic interactions encapsulate rich emotional exchange between interlocutors, suggesting a multimodal, cross-speaker and cross-dimensional continuous emotion dependency. This study explores the dynamic inter-attribute emotional dependency at the cross-subject level, with implications for continuous emotion recognition based on speech and body motion cues. We propose a novel two-stage Gaussian Mix...


MEMN: Multimodal Emotional Memory Network for Emotion Recognition in Dyadic Conversational Videos

Multimodal emotion recognition is a developing field of research which aims at detecting emotions in videos. For conversational videos, current methods mostly ignore the role of inter-speaker dependency relations while classifying emotions. In this paper, we address recognizing utterance-level emotions in dyadic conversations. We propose a deep neural framework, termed Multimodal Emotional Memo...


A Predictive Model for Emotion Recognition Based on Individual Characteristics and Autonomic Changes

Introduction: The importance of individual differences in the problem of emotion recognition has been repeatedly noted in previous studies. The main focus of this study was the prediction of heart rate variability (HRV) changes due to affective stimuli from subject characteristics. These features were age (A), gender (G), linguality (L), and sleep (S) information. In addition, the most...


Analysis and recoding of multimodal data

Emotions are part of our lives. Emotions can enhance the meaning of our communication. However, communication with computers is still done by keyboard and mouse. In this human-computer interaction there is no room for emotions, whereas if we communicated with machines the way we do in face-to-face communication, much information could be extracted from the context and emotion of the speaker. W...


Automated Recognition of Paralinguistic Signals in Spoken Dialogue Systems: Ways of Improvement

The ability of artificial systems to recognize paralinguistic signals, such as emotions, depression, or openness, is useful in various applications. However, the performance of such recognizers is not yet perfect. In this study we consider several directions which can significantly improve the performance of such systems. Firstly, we propose building speaker- or gender-specific emotion models. Th...



Journal title:

Volume   Issue

Pages  -

Publication date: 2010